Improving Sentiment Analysis with Document-Level Semantic Relationships from Rhetoric Discourse Structures

Authors

  • Joscha Märkle-Huß
  • Stefan Feuerriegel
  • Helmut Prendinger

Abstract

Conventional sentiment analysis usually neglects semantic information between (sub-)clauses, as it merely implements so-called bag-of-words approaches, in which the sentiment of individual words is aggregated independently of the document structure. Instead, we advance sentiment analysis by using rhetorical structure theory (RST), which provides a hierarchical representation of texts at the document level. For this purpose, texts are split into elementary discourse units (EDUs). These EDUs span a hierarchical structure in the form of a binary tree, whose branches are labeled according to the semantic discourse relation between them. Accordingly, this paper proposes a novel combination of weighting and grid search to aggregate sentiment scores from the RST tree, as well as feature engineering for machine learning. We apply our algorithms to the especially hard task of predicting stock returns subsequent to financial disclosures. As a result, machine learning improves the balanced accuracy by 8.6 percent compared to the baseline.
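
The weighting-plus-grid-search idea can be illustrated with a toy sketch. It assumes a hypothetical word lexicon (`LEXICON`), one weight per discourse relation applied only to satellite EDUs, a positive/negative decision by the sign of the aggregated score, and binary return labels; all names (`Node`, `aggregate`, `grid_search`) are illustrative, and the abstract does not specify the paper's actual weighting scheme or machine-learning features. Balanced accuracy is computed as the mean of per-class recall.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon (illustrative only, not the paper's dictionary).
LEXICON = {"growth": 1.0, "strong": 1.0, "loss": -1.0, "weak": -1.0}

def edu_score(text: str) -> float:
    """Bag-of-words sentiment of one EDU: sum of lexicon scores of its tokens."""
    return sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())

@dataclass
class Node:
    """Binary RST tree node: leaves hold an EDU; inner nodes hold a relation
    label and the nuclearity of each child ("N" = nucleus, "S" = satellite)."""
    edu: Optional[str] = None
    relation: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    nuclearity: Tuple[str, str] = ("N", "N")

def aggregate(node: Node, w: Dict[str, float]) -> float:
    """Recursively sum child scores, scaling satellites by a relation-specific
    weight (assumption); weight 1.0 everywhere reduces to plain bag-of-words."""
    if node.edu is not None:
        return edu_score(node.edu)
    total = 0.0
    for child, role in zip((node.left, node.right), node.nuclearity):
        factor = w.get(node.relation, 1.0) if role == "S" else 1.0
        total += factor * aggregate(child, w)
    return total

def balanced_accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Mean of per-class recall, robust to imbalanced up/down return labels."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def predict(docs: List[Node], w: Dict[str, float]) -> List[int]:
    """Label 1 (positive return) if the aggregated sentiment is above zero."""
    return [1 if aggregate(d, w) > 0 else 0 for d in docs]

def grid_search(docs, labels, relations, grid):
    """Exhaustively try every weight combination; keep the best balanced accuracy."""
    best_acc, best_w = -1.0, {}
    for values in product(grid, repeat=len(relations)):
        w = dict(zip(relations, values))
        acc = balanced_accuracy(labels, predict(docs, w))
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_acc, best_w

# Toy documents (made up for illustration): label 1 = positive return.
docs = [
    Node(relation="Concession", nuclearity=("S", "N"),
         left=Node(edu="despite weak loss"), right=Node(edu="strong growth")),
    Node(relation="Elaboration", nuclearity=("N", "S"),
         left=Node(edu="weak loss"), right=Node(edu="strong growth")),
    Node(edu="strong growth"),
    Node(edu="weak loss"),
]
labels = [1, 0, 1, 0]

baseline = balanced_accuracy(labels, predict(docs, {}))  # all weights 1.0
best_acc, best_w = grid_search(docs, labels, ["Concession", "Elaboration"],
                               [0.0, 0.5, 1.0, 1.5])
print(baseline)          # 0.75
print(best_acc, best_w)  # 1.0 {'Concession': 0.0, 'Elaboration': 0.0}
```

In this toy setup, plain bag-of-words scores the first document as neutral because the conceded negative clause cancels the positive nucleus; down-weighting satellites of the Concession relation recovers the correct label. Exhaustive grid search is used here only because the toy weight space is tiny.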

Similar resources

Sentiment analysis based on rhetorical structure theory: Learning deep neural networks from discourse trees

Prominent applications of sentiment analysis are countless, including areas such as marketing, customer service and communication. The conventional bag-of-words approach for measuring sentiment merely counts term frequencies; however, it neglects the position of the terms within the discourse. As a remedy, we thus develop a discourse-aware method that builds upon the discourse structure of docum...

Better Document-level Sentiment Analysis from RST Discourse Parsing

Discourse structure is the hidden link between surface features and document-level properties, such as sentiment polarity. We show that the discourse analyses produced by Rhetorical Structure Theory (RST) parsers can improve document-level sentiment analysis, via composition of local information up the discourse tree. First, we show that reweighting discourse units according to their position i...

DFDS: A Domain-Independent Framework for Document-Level Sentiment Analysis Based on RST

Document-level sentiment analysis is among the most popular research fields of natural language processing in recent years, in which one of the major challenges is that discourse structural information can hardly be captured by existing approaches. In this paper, a domain-independent framework for document-level sentiment classification with weighting rules based on Rhetorical Structure Theory is pro...

Improving a Pipeline Architecture for Shallow Discourse Parsing

We present a system that implements an end-to-end discourse parser. The system uses a pipeline architecture with seven stages: preprocessing, recognizing explicit connectives, identifying argument positions, identifying and labeling arguments, classifying explicit and implicit connectives, and identifying attribution structures. The discourse structure of a document is inferred based on these c...

Discourse Connectors for Latent Subjectivity in Sentiment Analysis

Document-level sentiment analysis can benefit from fine-grained subjectivity, so that sentiment polarity judgments are based on the relevant parts of the document. While fine-grained subjectivity annotations are rarely available, encouraging results have been obtained by modeling subjectivity as a latent variable. However, latent variable models fail to capitalize on our linguistic knowledge abo...

Publication date: 2017